Sequence Comparison Algorithms in Genome Databases
Abstract
We survey several important formulations of the sequence database search problem (i.e., local sequence alignment) and their solutions. These include the Maximal Segment Pair (MSP) formulation, which has a trivial dynamic programming solution and serves as the basis for popular heuristic solutions such as BLAST [2] and FASTA [18]. The local alignment with gaps formulation, as well as a restricted version of the problem, is also studied in detail. Several recent linear and sub-linear time algorithms are studied and compared. A theory of alignment scores based on the MSP formulation is examined. The theory gives a framework for choosing the substitution matrix in a particular context. Under the same framework, the statistical significance of alignment scores can be assessed. BLAST is a fast and popular heuristic solution that combines ideas from formal approaches. We outline the main algorithm as well as the reasoning behind the choice of its parameters, drawing on formal theoretical analysis and empirical evidence.
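As an illustration of the MSP formulation mentioned in the abstract, the following is a minimal sketch of an ungapped dynamic programming search for a best-scoring segment pair. It is not taken from the surveyed paper: the function name `maximal_segment_pair` and the simple match/mismatch scores are assumptions standing in for a real substitution matrix.

```python
def maximal_segment_pair(a, b, match=1, mismatch=-1):
    """Return (score, start_a, start_b, length) of a best ungapped segment pair.

    Hypothetical helper: match/mismatch scores are illustrative assumptions,
    used here in place of a substitution matrix.
    """
    best = (0, 0, 0, 0)
    # prev[j] holds (score, length) of the best segment ending at a[i-2], b[j-1]
    prev = [(0, 0)] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [(0, 0)] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score, length = prev[j - 1]  # extend along the diagonal only (no gaps)
            score += s
            if score > 0:
                curr[j] = (score, length + 1)
                if score > best[0]:
                    best = (score, i - 1 - length, j - 1 - length, length + 1)
            # otherwise the segment is dropped and curr[j] stays (0, 0)
        prev = curr
    return best


result = maximal_segment_pair("ACGTTGCA", "TTACGTAA")
print(result)
```

The recurrence only looks at the diagonal predecessor, so the sketch runs in time proportional to the product of the two sequence lengths and keeps just two rows in memory.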
Similar resources
Analysis of string-searching algorithms on biological sequence databases
String-searching algorithms are used to find the occurrences of a search string in a given text. The advent of digital computers has stimulated the development of string-searching algorithms for various applications. Here, we report the performance of all string-searching algorithms on widely used biological sequence databases containing the building blocks of nucleotides (in the case of nuclei...
Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species
Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome, which is an important wild bio-source for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic implications in plants. Different ch...
An infrastructure for comparative genomics to functionally characterize genes and proteins.
Current genome projects are resulting in a flood of sequence data. The interpretation of these sequences is lagging, and optimized data analysis strategies need to be developed. Much can be learned from comparing different genomes, as genomes of distant organisms may still encode proteins with high sequence similarity. The order of genes (colinearity) in genomes may also be conserved to some e...
A comparison of algorithms for minimizing the sum of earliness and tardiness in hybrid flow-shop scheduling problem with unrelated parallel machines and sequence-dependent setup times
In this paper, the flow-shop scheduling problem with unrelated parallel machines at each stage and sequence-dependent setup times is studied, with the objective of minimizing the sum of earliness and tardiness. The processing times, setup times and due-dates are known in advance. To solve the problem, we introduce a hybrid memetic algorithm as well as a particle swarm optimization algorithm combine...
Protein Databases
Proteins are sources of many peptides with diverse biological activity. Some of them are considered as valuable components of foods and drug targets with desired and designed biological activity. We are now entering an era rich in biological data in which the field of bioinformatics is poised to exploit this information in increasingly powerful ways. There are currently many databases all over ...
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Publication date: 2007